Abstract 

Background: Epithelial ovarian cancer is characterized by multiple genomic alterations; most are passenger 
alterations which do not confer tumor growth. Like many cancers, it is a heterogeneous disease and can be broadly 
categorized into 4 main histotypes of clear cell, endometrioid, mucinous, and serous. To date, histotype-specific 
copy number alterations have been difficult to elucidate. The difficulty lies in having sufficient sample size in each 
histotype for statistical analyses. 

Methods: To dissect the heterogeneity of ovarian cancer and identify histotype-specific alterations, we used an 
in silico hypothesis-driven approach on multiple datasets of epithelial ovarian cancer. 

Results: In concordance with previous studies of the global copy number landscape, the merged data showed 
similar alterations. However, when the landscape was de-convoluted into histotypes, distinct alterations were 
observed. We report significant histotype-specific copy number alterations in ovarian cancer and show that 
there is genomic diversity amongst the histotypes. 76 cancer genes were found to be significantly altered, with 
several as potential copy number drivers, including ERBB2 in mucinous and TPM3 in endometrioid histotypes. 
ERBB2 was found to have preferential alterations: it was amplified in mucinous tumors (28.6%) but deleted in serous 
tumors (15.2%). Validation of ERBB2 expression by qPCR showed significant correlation with microarray data (p=0.007). There 
also appeared to be a reciprocal relationship between KRAS mutation and copy number alterations. In mucinous 
tumors where KRAS mutation is common, the gene was not significantly altered. However, KRAS was significantly 
amplified in serous tumors where mutations are rare in high grade tumors. 

Conclusions: The study demonstrates that the copy number landscape is specific to the histotypes and 
identification of these alterations can pave the way for targeted drug therapy specific to the histotypes. 
Background 

Ovarian cancer is often dubbed a 'silent killer' because of 
its non-specific symptoms and late clinical onset which 
contribute to overall poor prognosis. There has been a 
steady increase in incidence over the last three decades 
with 204,000 new cases diagnosed each year globally [1]. 
It ranks fifth in mortality among cancers in women 
and has the highest case-fatality rate among gynecological 
cancers [2]. The 5-year survival rate for women with 
advanced disease remains at 29%, with an estimated 125,000 
deaths annually [1,3]. 

About 90% of ovarian cancers are epithelial ovarian cancer 
(EOC) [4], histologically subtyped as serous, mucinous, 
endometrioid, or clear cell. Further subtyping includes the 
borderline cases, such as mucinous or serous borderline, 
which often present as stable disease with a more favorable 
outcome compared to the non-borderline subtypes. It is now 
recognized that epithelial ovarian cancer is a spectrum of 
diseases with varied genetic mutations among histotypes 
[5]. Mutations also differ between the grades of the 
disease. Low-grade serous carcinomas have a high frequency 
of KRAS and BRAF mutations but few p53 mutations, 
while high-grade serous carcinomas show the inverse 
frequency of these mutations [6,7]. The mucinous histotype 
has KRAS mutations [8] and the endometrioid histotype has PTEN 
mutations [9]. Despite the molecular heterogeneity, the 
standard treatment remains taxane/platinum-based 
chemotherapy for all histotypes. 

Genomic alterations such as copy number alterations 
(CNA) have been known to harbor drivers in carcino- 
genesis. Driver genes are genes that confer growth ad- 
vantage on the cancer cells [10]. Several known copy 
number alteration drivers in cancers include receptor 
tyrosine kinases such as EGFR, FGFR, and ERBB2, which 
are targets for drug therapy [11]. Successful incorporation 
of genomic alteration studies into cancer treatment 
has been evident in breast cancer, leukemia, and lung cancer, 
where targeted therapies are part of the standard treat- 
ment protocols [12-14]. For example, Trastuzumab, a 
targeted therapy that can significantly reduce risk of dis- 
ease recurrence and improve overall survival, is now 
standard of care for early-stage patients with Her2- 
positive breast cancer [14]. To date, targeted drug ther- 
apy has not been successfully incorporated in EOC. 

One of the challenges in elucidating copy number 
alterations in EOC is the disproportionate prevalence of the 
histotypes. Serous is the most prevalent (70-85%), followed 
by endometrioid (5-10%), clear cell (5-10%), and 
mucinous (~5%). The less prevalent histotypes tend to 
have sample sizes too small for statistical analyses to 
identify copy number alterations. To elucidate CNA in 
each histotype, we combined data from multiple studies 
with similar platforms to identify copy number altera- 
tions that are specific to the histotypes. The motivation 
is to identify high confidence histotype-specific altera- 
tions that may otherwise be obscured due to dispropor- 
tionate prevalence of the histotypes. 

Results 

Two major results are presented here: the histotype-specific 
copy number alterations for EOC and the identification 
of potential driver genes. Three datasets with 
corresponding gene expression and copy number profiling 
on similar platforms were used for this study. A 
hypothesis-driven approach using stringent false discovery 
rate (FDR) filtering was used to identify potential 
copy number driver genes (Methods). The genes were 
compared with other studies as well as cancer genes 
reported in the literature. In addition, we validated the 
expression of ERBB2, a candidate driver gene, via quantitative 
real-time PCR (qPCR). 

Distinct copy number alterations in EOC histotypes 

Figure 1a shows the global frequency of copy number 
aberrations for EOC across the genome for the merged 
dataset. In concordance with other reports, the four 
commonly reported regions of 3q, 8, 17p, and 20q 
showed broad copy number alterations in EOC [15-17]. 
When this was de-convoluted into histotypes (Figure 1b, 
shown in 4 tracks for clear cell, endometrioid, mucinous, 
and serous), distinct differences were observed. The 
general frequency for EOC was mirrored in serous tumors, 
which is not surprising since serous has the largest 
sample size. Of the four most commonly reported altered 
regions, 3q, 8, and 20q amplifications were observed in 
serous, clear cell, and endometrioid but not mucinous tumors. 
Endometrioid and serous tumors tend to harbor more copy 
number alterations, with more broad regions of alteration 
involving the p- or q-arm. The genomic landscape for clear 
cell and mucinous tumors appeared different from the 
other histotypes, with fewer broad regions of alteration, 
occurring at lower frequency. 

To assess the significance of copy number altered 
regions, we used a 2-pronged approach using merged 
and individual datasets (Additional file 1: Figure S1). 
Figure 2 shows significant copy number altered regions 
(see Methods) in the histotypes. Broad 3p amplification 
and 8p deletion were observed in serous tumors; 8q 
amplifications in clear cell and serous tumors; 17p de- 
letions in mucinous and serous tumors; and chr20 amplifi- 
cations in serous tumors. The nature of the alterations 
also differs, e.g. focal versus broad alterations. For example, 
in 3q, serous tumors showed broader amplifications in the 
region of 3q13.31-29 while focal amplification was 
observed for clear cell tumors at 3q26.2-26.32. This is 
interesting as it has been reported that overlapping broad 
and focal aberrations can have distinct functional conse- 
quences [18]. In other chromosomes, alterations were spe- 
cific to histotypes as well; such as 9p21 focal deletions in 
mucinous histotype, reportedly harboring homozygous 
deletions in EOC [19-22]. Some regions displayed 
opposite trends in alteration between histotypes. One 
region, 8p23.1, showed amplification in clear cell 
but deletion in serous tumors. Another was 17q12, which 
harbors the oncogene ERBB2; the gene was significantly 
amplified in mucinous but deleted in serous tumors. Excluding 
borderline cases, 28.6% (4/14) of mucinous samples had 
amplifications and 15.2% (15/99) of serous tumors had 
deletions of ERBB2. One of the 5 mucinous borderline samples 
also showed ERBB2 amplification. Although 
ERBB2 was found significantly deleted in serous tumors, 
5/99 (5.1%) of serous samples harbored the amplification, 
close to the 3% reported by TCGA for high grade serous 
tumors [23]. No ERBB2 deletion was observed in the mucinous 
samples. As serous tumors have a comparatively larger 
sample size than the other histotypes, we would expect 
more significant regions for this histotype. Nevertheless, 
using stringent criteria, we were able to identify some 
significant CNA for the less prevalent histotypes. 
Figure 1 Overview of copy number aberrations in EOC from 3 datasets, shown for chromosomes 1-X. Chromosomes are shown in alternating 
blocks of grey. The centromere for each chromosome is shown as a dotted line. (a) Frequency (%) of occurrence of amplification (red) and 
deletion (blue) from chromosome 1 to X in epithelial ovarian cancer from the merged 3 datasets. Major regions of alterations reported in other 
studies are similarly observed: e.g. chr 3, 8, 17, and 20. (b) Frequency (%) of occurrence in the 4 main histotypes of EOC. Threshold for frequency 
was set at |LRR| > 0.2. The frequency for serous tumors is similar to that in (a). However, frequency for the less prevalent histotypes showed 
evident differences, indicating the molecular differences between the histotypes. 



To quantify genes that were altered in each histotype, 
we mapped genes to the regions identified in each 
histotype. 6375 unique genes were found to be altered: 
2682 amplifications and 3712 deletions (Additional file 2: 
Table S1). Of the amplified genes, 91% were found in serous 
tumors, 19.1% in clear cell, 14.3% in endometrioid, and 0.5% 
in mucinous; of the deleted genes, 97.1% were found in serous 
tumors, 1.5% in clear cell, and 11.5% in mucinous. A total of 5360 
genes were specific to a single histotype (amplification=2014, 
deletion=3346): 5011 in serous tumors, 193 in endometrioid 
tumors, 79 in clear cell tumors, and 77 in mucinous 
tumors. Within each histotype, the type of alteration 
varied. Clear cell tumors had more amplified genes than 
deleted genes, while mucinous and serous tumors had 
more deleted genes than amplified genes. Only amplified 
genes were found in endometrioid tumors. Figure 3 shows 
the Venn diagram of overlapping genes in the less prevalent 
histotypes. Due to the differences in sample size of 
each histotype, comparisons between overlapped genes 
were limited to genes found only in the non-serous 
tumors, where the sample sizes are more comparable. A 
small number of overlapped amplified and deleted genes 
between clear cell and mucinous tumors were observed; 
none with endometrioid. This suggests that most of the 
CNA are specific to histotypes, adding to the mounting 
evidence of molecular differences between histotypes. 
Despite the larger sample size of the serous histotype, comparison 
of CNA found in the less prevalent histotypes with those in 
serous tumors is also of interest (Additional file 3: Figure S2). Clear 
cell tumors shared the highest proportion of altered 
genes with serous tumors (7.6%), while endometrioid 
tumors shared the lowest (3.0%, with serous tumors only). 
There was also more overlap in amplifications than in 
deletions. 

Figure 2 Significance of copy number alterations (green) in clear cell, endometrioid, mucinous, and serous histotypes, shown in 4 
horizontal tracks respectively. GISTIC q-values (y-axis) are shown as -log10. Regions that passed the statistical selection criteria (see Methods) 
are shown as: amplification (red) and deletion (blue). Some known cancer and putative genes are indicated at the bottom of the plots. ERBB2 in 
chr17 is shown to be amplified in the mucinous track. Note scale difference in y-axis for the histotypes. 

Consistency with other studies 

Three studies have reported copy number changes in 
EOC [16,17,24] (Additional file 4: Table S2). These stud- 
ies focused on global changes in EOC rather than 
histotype-specific alterations. Nevertheless, histotype- 
specific alterations from our study should overlap to 
some degree with these reported genes. We compiled a 
list of 551 significant genes reported in the 3 studies, of 
which 545 genes were reported by either Haverty et al. 
or Gorringe et al. Eleven genes (2%) were 
identified in at least 2 of the studies. In comparison, 
39.2% (216/551) of the genes were found in our study, 
indicating that our approach can identify copy number 
altered genes. We also compared our findings in serous 
tumors with TCGA high grade serous study (n=489) [23] 
where they reported 63 regions of gains and 50 regions 
of deletion. Based on overlapping of genes (if available) 
or genomic regions, 29/63 (46%) amplified and 27/50 
(54%) deleted regions were also found in our study 
(n=101) (Additional file 5: Table S3). 

Copy number alterations in known cancer genes 

300 cancer genes were previously reported [25], of which 
76 were found within the altered regions 
(Table 1). Copy number alterations of 
these cancer genes were also specific to histotypes, 
e.g. TPM3 amplification in endometrioid tumors; JAK2 
deletion in mucinous tumors; RB1 deletion in both clear cell 
and serous tumors; and TP53 and MAP2K4 deletion in 
mucinous and serous tumors. ERBB2, a gene implicated in 
breast cancer and EOC, showed significant focal amplification in 
mucinous tumors but deletions in serous tumors. Evaluation 
of ERBB2 expression in mucinous and serous 
tumors in the 3 datasets showed a trend of 
overexpression of ERBB2 in mucinous compared to serous tumors 
(Additional file 6: Table S4). The focal amplification of 
ERBB2 has been observed in various studies [26-28], 
supporting our findings. 

Identification of candidate driver genes in EOC histotypes 

To identify candidate driver genes that might contribute 
to carcinogenesis of EOC, we looked for genes that 
showed association between copy number and gene 
expression (Methods, Additional file 1: Figure S1). Table 1 
summarizes the alterations and potential cancer driver 
genes (highlighted in bold) based on cytoband. Pathway 
analysis of the potential driver genes in Table 1 showed that the top 
molecular functions of these genes involve the cell 
cycle, cellular development, growth, and proliferation. 

Candidate drivers in known cancer genes 

Among the 76 identified cancer genes listed in Table 1, 
several were potential drivers (highlighted in bold) 
in EOC in the association analysis. These include FH, 
GMPS, PIK3CA, EIF4A2, ZNF384, and SS18L1 (amplifications 
in serous tumors); TET2, FGFR1OP, NF1, ERBB2, 
and SH3GL1 (deletions in serous tumors); and ERBB2 
(amplifications in mucinous tumors). Figure 4 shows the 
association between copy number and gene expression 
of ERBB2. The correlation was significant for all 3 datasets 
(Dataset1: R=0.80, p=3.47E-09; Dataset2: R=0.74, 
p=9.44E-6; Dataset3: R=0.79, p=1.14E-6; meta-p=3.90E-17), 
suggesting a driver mechanism. Amplification was 
observed in mucinous and deletion in serous tumors. 
Another interesting observation was that MYC, TP53, KRAS, and 
BRCA1, genes reported to be commonly mutated in 
cancers, did not show significant association between 
copy number and gene expression. Similarly, two other 
genes reported to be mutated in EOC (PIK3R1 and 
STK11) did not show a potential driver mechanism. 

Validation of ERBB2 expression 

Validating all candidate copy number driver genes is 
beyond the scope of this study. However, as ERBB2 has 
potential for targeted therapy, we validated the expression 
of ERBB2 via qPCR in 7 samples from Dataset1 that were 
found to be amplified or deleted. No further samples 
were available from Dataset2 for validation. 
Figure 5 shows the scatter plot between gene 
expression from microarray and qPCR (measured as 
fold-change). Significant correlation was observed (p=0.007) 
between microarray and qPCR. Four ERBB2 amplified 
samples (1 serous, 1 mucinous, 1 mucinous borderline, 
and 1 clear cell) showed correspondingly higher expression 
than the 3 serous samples with deleted ERBB2 (p=0.06, 
Wilcoxon). All these data support ERBB2 as a copy 
number driver gene in EOC. 
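
As an aside on the Wilcoxon comparison above: with groups of 4 and 3 samples, the smallest two-sided p-value an exact rank-sum test can return is 2/35 (about 0.057), which is consistent with the reported p=0.06. The sketch below enumerates all rank assignments to illustrate this. The fold-change values are hypothetical placeholders, not the study's measurements, and the function name is introduced here for illustration only.

```python
from itertools import combinations

def exact_rank_sum_p(group_a, group_b):
    """Two-sided exact Wilcoxon rank-sum p-value (assumes no tied values)."""
    pooled = sorted(group_a + group_b)
    ranks = {value: i + 1 for i, value in enumerate(pooled)}
    n_a, n = len(group_a), len(pooled)
    observed = sum(ranks[v] for v in group_a)
    expected = n_a * (n + 1) / 2  # mean rank sum of group A under the null
    # Enumerate every way of assigning n_a of the n ranks to group A.
    sums = [sum(c) for c in combinations(range(1, n + 1), n_a)]
    extreme = sum(1 for s in sums
                  if abs(s - expected) >= abs(observed - expected))
    return extreme / len(sums)

# Hypothetical fold-changes: all 4 amplified samples exceed the 3 deleted ones.
amplified = [5.2, 3.9, 4.4, 6.1]
deleted = [0.6, 0.8, 0.5]
p = exact_rank_sum_p(amplified, deleted)
print(round(p, 3))
```

Because complete separation of the two groups already yields the minimum attainable p-value, a result near 0.06 reflects the small validation set rather than a weak effect.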

Discussion 

The study reveals genomic diversity in EOC. It is 
conceivable that some of these alterations are involved in 
the tumorigenesis of EOC, but the pathogenesis is likely 
shaped by histotype-specific alterations. 
By stratifying based on histotypes, we were able to 
identify alterations in the less prevalent clear cell, 
endometrioid, and mucinous samples. An example is ERBB2. 
Table 1 Summary of copy number alterations and potential driver cancer genes 
For simplicity, the altered regions are summarized into cytobands in the first column. The second (AMP) and third (DEL) columns indicate the amplification and deletion 
respectively for each histotype (C-clear cell, E-endometrioid, M-mucinous, S-serous). The amplification (A) and deletion (D) status for each histotype are indicated in each 
cytoband. The last column shows known cancer genes (from Sanger COSMIC database) that are within the altered regions. In total, 76 genes are listed in this table. 
Cancer genes which are potential drivers (i.e. significant correlation between gene expression and copy number alterations) are highlighted in bold. 



Several groups have investigated alterations of ERBB2 in 
EOC with mixed results [29,30]. However, when strati- 
fied into histotypes, the high prevalence of ERBB2 amp- 
lification in mucinous is clearly evident in our results 
and other studies [26-28]. In addition, our results sup- 
port ERBB2 as a potential copy number driver gene in 
EOC. The differences of ERBB2 copy number alterations 
amongst the histotypes could be due to the origin of his- 
totypes in EOC. Our study demonstrates the importance 
of histotype-specific analyses where the differing copy 
number landscape amongst the histotypes adds to the 
mounting evidence that EOC should not be treated as 
one disease. 



76 cancer genes listed in Table 1 were found to be 
copy number altered in EOC. They include ERBB2, 
TPM3, BRCA1, BRAF, KRAS, and PIK3CA, some of 
which are potential copy number drivers, e.g. PIK3CA 
and BRAF in the serous histotype. Another interesting 
observation concerned KRAS, a gene reported to be mutated in 
mucinous tumors. In our study, KRAS was not 
significantly altered in mucinous tumors, where mutations 
are common. However, KRAS was significantly 
amplified in serous tumors, where mutations are rare in 
high grade tumors. A reciprocal relationship between 
KRAS mutation and copy number alteration has also been 
observed in gastric cancer [11]. The 8q24.21 region 
harboring MYC, on the other hand, was altered in most 
histotypes other than mucinous. The TCGA study also 
indicates that MYC is highly amplified in high grade serous 
tumors. This suggests that MYC inhibitors may be 
applicable for these histotypes. Of the cancer genes 
reported to have somatic or germline mutations 
in EOC, 3 were found to harbor copy number deletions in 
the serous histotype: PIK3R1, BRCA1, and STK11. For 
BRCA1, our finding was concordant with a previous 
report that the BRCA1 locus can be lost via deletion 
or epigenetic silencing, in addition to mutation, in sporadic 
EOC [31]. 

A number of the candidate drivers in Table 1 are also 
implicated in translocation aberrations, e.g. TPM3, 
BCL9, GMPS, ZNF384, and SS18L1. It is interesting that 
these genes were amplified in endometrioid and/or serous 
tumors. TPM3 and BCL9 reside in 1q21, a frequent 
site for chromosomal rearrangements. TPM3 was specifically 
amplified in endometrioid tumors, and the gene has 
been shown to form a fusion gene with NTRK1, which 
belongs to the group of TRK oncogenes reported for papillary 
thyroid carcinoma [32]. Interestingly, NTRK1 is also 
significantly amplified only in endometrioid tumors, and 
further investigation is required to ascertain if this is due 
to gene fusion. BCL9 is a novel oncogene in the Wnt signaling 
pathway, playing a critical role in epithelial-mesenchymal 
transition in colon epithelium and adenocarcinomas 
[33,34]. Translocation of BCL9 has been reported with 
14q32 [35], and the gene was amplified in both endometrioid 
and serous tumors. Translocations of GMPS, 
ZNF384, and SS18L1 have also been found in leukemia and 
synovial sarcomas [36-40], and all three genes were amplified 
in serous tumors. 

There are several drugs targeting these genes. For 
ERBB2, inhibitors include Trastuzumab, Lapatinib, and 
Pertuzumab. Recently, a clinical trial of the combination of 
Pertuzumab, Trastuzumab, and Docetaxel showed improved outcome 
in patients with HER2-positive metastatic breast cancer 
[41]. BRAF mutations are more common in low grade serous 
tumors while BRAF amplification is more common in high 
grade serous tumors. Our data showed that BRAF is a potential copy 
number driver and hence may be targetable by BRAF 
inhibitors in serous tumors. Most BRAF inhibitors target 
various mutations, and their efficacy on amplified BRAF is 
not yet well understood. One study has shown that BRAF-amplified 
colorectal cancer cells acquired resistance to the 
MEK1/2 inhibitor selumetinib [42]. PIK3CA is significantly 
amplified in the serous histotype and could be a potential 
target for PI3K inhibitors. In a study of a PI3K inhibitor 
in breast and gynecologic malignancies harboring PIK3CA 
mutations, patients with the mutations treated with the 
inhibitor showed a higher response rate than patients without 
the mutations [43]. 

Figure 5 Scatter plot of ERBB2 gene expression and qPCR of 7 samples identified to be copy number altered in Dataset1. 4 ERBB2 
amplified samples (serous, mucinous, mucinous borderline, and clear cell) had higher expression than the 3 ERBB2 deleted serous samples. 
Significant correlation was observed between gene expression and qPCR (p=0.007). 

In combining the 3 datasets, there was concern with 
regard to the genetic diversity amongst the Chinese, 
Japanese, and Caucasian samples. The HapMap [44] and 
Human Genome Diversity projects [45] have shown 
that these ethnic groups are different, though Chinese 
(CHB) and Japanese (JPT) tend to have high similarity in 
population structure. As genetic differences can be evaluated 
via principal component analysis (PCA) [46], we 
used PCA to assess the copy number data of the 3 
cohorts. No distinct clustering between the groups 
(Additional file 7: Figure S3) was observed, suggesting 
that in this particular copy number landscape, the genetic 
effect is not evident and therefore has minimal effect 
on the analyses. We also used an ANOVA test to assess 
~200 housekeeping genes between the 3 datasets; none 
of the genes showed significance (Additional file 8: Table 
S5). Note that ERBB2 also did not show any significance. 
Nevertheless, genetic differences were taken into consideration 
in the preprocessing protocol. Each dataset 
was normalized with respect to the relevant ethnic group 
from HapMap data, i.e. Dataset1 with JPT, and Dataset2 
with CHB. Ethnic-specific common structural polymorphisms 
were also filtered out (see Data analysis) to ensure 
that the copy number alterations identified in this study 
are de novo alterations in tumors. 

We recognize that the regions identified could still be 
limited by the individual sample size of the histotypes. 
The larger number of copy number altered genes in serous 
tumors could be attributed to the larger sample size 
in this collection. We performed sub-sampling analyses 
to ascertain these effects. In addition, to ensure robustness 
of results, we used stringent criteria to filter the 
regions and considered CNA genes only if they 
were supported by at least 2 datasets (Additional file 1: 
Figure S1). The flip side of this filtering was that true 
regions of alteration could be filtered out (as shown in 
the green area in Figure 2), probably leading to more 
false negatives. Nevertheless, despite 
the filtering and limited sample size of some histotypes, 
significant regions were still observed in the less prevalent 
histotypes, e.g. the 1p36.33, 2p11.1, 19q13.31, and 
20q13.33 amplifications and 9q32 deletion in clear cell 
tumors (n=29); 1q21.2-3 amplification in endometrioid 
tumors (n=20); and 17q12 amplification and 9p24.1 deletion 
in mucinous tumors (n=19). Note that endometrioid 
tumors were not available in Dataset2, although the total 
number of tumors was comparable with clear cell and 
mucinous. The criterion of concordance in 2 
datasets in the analytical workflow would thus bias the 
identification of regions for this histotype. Despite this, 
significant alterations were still observed for endometrioid 
tumors (e.g. TPM3) and, given the stringent criteria, these 
are likely high confidence alterations. It should be noted 
that the samples were stratified according to the 4 main 
histotypes, including some borderline cases. Although 
borderline cases present clinically as a different 
subtype, they were included to simplify the stratification 
of histotypes and analyses. The significance of this 
approach can be seen in ERBB2, where both mucinous and 
mucinous borderline cases harbor amplification and 
corresponding upregulation of expression. This was 
similarly observed in other studies [26-28]. To assess whether copy 
number alterations differ between borderline and 
non-borderline tumors and would thus bias our 
analyses, we evaluated PCA of these samples (Additional file 9: 
Figure S4). No distinct clustering was observed between the 
borderline and non-borderline groups. 

Conclusions 

In summary, our study showed genomic diversity in 
EOC and highlighted distinct copy number alterations in 
histotypes that may have potential for targeted drug 
therapy. ERBB2 is significantly amplified in mucinous 
tumors and is a candidate copy number driver gene. By 
merging multiple datasets from similar platforms, we 
demonstrated that CNA in the less prevalent histotypes 
could be elucidated, even with limited sample size. 

Methods 

Dataset1 

56 archived frozen tumor samples from the Department 
of Gynecology & Obstetrics, Kyoto University Graduate 
School of Medicine, Japan were profiled on microarray. They 
comprised 12 clear cell carcinomas, 6 endometrioid 
adenocarcinomas, 2 mucinous adenocarcinomas, 5 
mucinous-borderline tumors, 26 serous adenocarcinomas, and 5 
serous-borderline tumors. 

Dataset2 

46 archived frozen tumor samples were collected from the 
Department of Obstetrics and Gynecology, Tri-Service 
General Hospital, Taiwan, comprising 9 clear cell, 6 
mucinous, and 31 serous tumors. 

Dataset3 
 
GSE19539, consisting of 8 clear cell, 14 endometrioid, 6 
mucinous, and 39 serous tumors [15]. The normal blood samples 
available in the dataset were used for normalization, in 
concordance with the original paper. 
Human ovarian carcinoma samples 

Two collections of archived flash frozen ovarian carcinoma 
samples were obtained from the Department of Gynecology 
& Obstetrics, Kyoto University Graduate School of Medicine, 
Japan (Dataset1, N=56) and the Department of Obstetrics 
and Gynecology, Tri-Service General Hospital, Taiwan 
(Dataset2, N=46). All samples were collected with the 
donors' written informed consent. Ethical clearance was 
granted by the Institutional Review Boards of both 
institutes. All samples were reviewed by at least one 
pathologist from the respective institutes for histopathological 
typing and purity of samples. Tumor genomic DNA was 
extracted using the phenol-chloroform extraction method. 
Tumor RNA was extracted using Qiazol followed by 
column clean-up using the miRNeasy kit (Qiagen). 

Copy number profiling 

Affymetrix Genome-Wide Human SNP Arrays 6.0 
(Affymetrix, Santa Clara, California) were used for copy 
number analysis according to the cytogenetics protocol 
from the manufacturer. Data was pre-processed and 
normalized with HapMap JPT or CHB for Japan and Taiwan 
samples respectively using the Affymetrix Genotyping 
Console. Copy number segments were obtained from the 
circular binary segmentation (CBS) algorithm [47] 
implemented in the R package DNAcopy using default settings. 

Gene expression profiling 

Affymetrix GeneChip Human Gene 1.0 ST Array was 
used for gene expression analysis according to the 
protocols from the manufacturer. Data was pre-processed and 
RMA normalized [48] using Affymetrix Gene Expression 
Console. Expression values were mean-aggregated 
for each gene based on the Affymetrix probe annotation. 
Note: 3 mucinous samples in Dataset2 and 2 serous 
samples in Dataset3 do not have corresponding gene 
expression data. 
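
The mean-aggregation step can be sketched as follows. This is a minimal illustration, assuming probe-level values held in a dictionary and a probe-to-gene annotation map; the probe identifiers, values, and function name are hypothetical and not taken from the Affymetrix annotation files.

```python
from collections import defaultdict
from statistics import mean

def aggregate_by_gene(probe_values, probe_to_gene):
    """Average probe-level expression values that map to the same gene."""
    grouped = defaultdict(list)
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # skip probes without a gene annotation
            grouped[gene].append(value)
    return {gene: mean(values) for gene, values in grouped.items()}

# Hypothetical probe IDs and log2 expression values for one sample.
probe_values = {"p1": 8.0, "p2": 9.0, "p3": 5.5, "p4": 7.0}
probe_to_gene = {"p1": "ERBB2", "p2": "ERBB2", "p3": "KRAS"}
gene_expression = aggregate_by_gene(probe_values, probe_to_gene)
print(gene_expression)
```

Probes without an annotation (here `p4`) are dropped rather than assigned to a gene, which is one possible convention.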

Data repository 

The gene expression and copy number datasets are 
MIAME compliant and have been submitted to the National 
Center for Biotechnology Information (NCBI) Gene 
Expression Omnibus (GEO) website, series accession 
number GSE30311. 

Quantitative real time PCR 

Total RNA, including miRNA, was isolated from the ovarian 
carcinoma tissues using the miRNeasy kit (QIAGEN), of which 
500 ng was used to generate cDNA using the RT² First 
Strand Kit (QIAGEN). For the qPCR run, 200 ng of first 
strand cDNA was used per gene analysis. To determine 
the expression profile, ERBB2 transcript expression 
levels were normalized against the averaged expression 
levels of 5 housekeeping genes (ACTB, B2M, GAPDH, 
HPRT and RPL13A). 

Delta-Ct (ΔCt) and fold-change determination 

Ct was determined using the SDS software (version 2.3, 
Applied Biosystems). Briefly, Ct values were determined by 
setting the baseline between cycle 2 of the run (total run: 
40 cycles) and 2 cycles before the start of the first log- 
phase amplification. The threshold was set by positioning 
the limit to the lower third of the earliest amplification. 
ΔCt was calculated by the formula: 

$$\Delta Ct = Ct(\mathrm{GOI}) - Ct(\mathrm{HKG})$$ 

where Ct(GOI) is the Ct value of the respective gene of interest 
(GOI) and Ct(HKG) is the average Ct value of the 5 housekeeping 
genes (HKG) used in the assay. The fold-change of 
the transcript is determined by the following formula: 

$$\text{Fold-change} = 2^{-\Delta Ct}$$ 
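
As a worked illustration of the two formulas above, the following sketch computes delta-Ct and fold-change for one sample. The Ct values are hypothetical; only the choice of the five housekeeping genes follows the text.

```python
from statistics import mean

def delta_ct(ct_goi, ct_housekeeping):
    """Ct of the gene of interest minus the mean Ct of the housekeeping genes."""
    return ct_goi - mean(ct_housekeeping)

def fold_change(d_ct):
    """Relative expression: 2 raised to the power of minus delta-Ct."""
    return 2.0 ** (-d_ct)

# Hypothetical Ct values for one sample.
hkg_ct = {"ACTB": 18.0, "B2M": 19.0, "GAPDH": 20.0, "HPRT": 24.0, "RPL13A": 19.0}
erbb2_ct = 22.0
d_ct = delta_ct(erbb2_ct, list(hkg_ct.values()))
fc = fold_change(d_ct)
print(d_ct, fc)
```

With these placeholder values the housekeeping mean is 20, so delta-Ct is 2 and the fold-change is 0.25; a lower Ct for the gene of interest would give a fold-change above 1.
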
Data analysis 

To identify significant copy number altered regions, we 
used a 2-pronged workflow employing the GISTIC algorithm 
[18]. GISTIC identifies copy number alterations 
based on the frequency as well as the log relative ratio 
(LRR) signals to compute the q-value (false discovery 
rate). Default settings were used in the GISTIC analysis, 
and amplification and deletion thresholds were set at 0.2 
and -0.2 respectively. Additional file 1: Figure S1 shows 
the 2-pronged workflow involving merged and individual 
copy number datasets to identify copy number alterations. 
Alterations were considered significant if they passed the 
following filtering criteria: (i) q < 0.25 (individual dataset), 
(ii) q < 0.05 (merged dataset), and (iii) concordance in 2 or 
more datasets. The significant regions were then mapped 
to genes (hg18 RefSeq) by averaging the segments within 
each gene. An ANOVA test was used to identify 
histotype-specific alterations. The analyses resulted in a list of 
significant gain and loss genes for each histotype, 
summarized in Figure 2 and Table 1 (known cancer genes). 
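
The three filtering criteria can be sketched as a simple predicate. This is one reading of the criteria, assuming that a region must pass the merged-dataset cutoff and be supported at the individual-dataset cutoff in at least 2 datasets; the function name, data layout, and q-values are hypothetical, and the GISTIC q-values themselves are taken as given inputs.

```python
def passes_filter(q_merged, q_individual, merged_cutoff=0.05,
                  individual_cutoff=0.25, min_datasets=2):
    """Apply the three criteria to one candidate region.

    q_individual maps dataset name -> GISTIC q-value, or None if the
    region was not called in that dataset.
    """
    supporting = [name for name, q in q_individual.items()
                  if q is not None and q < individual_cutoff]
    return q_merged < merged_cutoff and len(supporting) >= min_datasets

# Hypothetical q-values for three candidate regions.
regions = {
    "region_A": (0.01, {"Dataset1": 0.10, "Dataset2": 0.20, "Dataset3": 0.60}),
    "region_B": (0.01, {"Dataset1": 0.10, "Dataset2": 0.40, "Dataset3": None}),
    "region_C": (0.08, {"Dataset1": 0.01, "Dataset2": 0.02, "Dataset3": 0.03}),
}
significant = [name for name, (qm, qi) in regions.items()
               if passes_filter(qm, qi)]
print(significant)
```

In this toy input, `region_B` fails for lack of concordance and `region_C` fails the merged cutoff, so only `region_A` is retained.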

To identify potential driver genes, non-parametric Spearman 
correlation was used to assess the association between 
gene expression and copy number alterations of each 
gene in each dataset (Additional file 1: Figure S1). Fisher's 
combined probability test (meta-p) [49] was then used to 
combine the correlation statistics from each dataset to 
identify potential driver genes. This hypothesis-driven 
association approach has been used to identify potential 
cancer driver genes [50,51]. Potential driver genes among known 
cancer genes are listed in Table 1 (in bold). 
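
The association analysis can be sketched in plain Python as below. The copy number and expression values are hypothetical; the three p-values passed to Fisher's method are the per-dataset ERBB2 values reported in the Results, and combining them gives roughly 3.9E-17, in line with the reported meta-p. The function names are introduced here for illustration and the closed-form chi-square tail is a standard identity for even degrees of freedom, not necessarily how the original analysis in R was implemented.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def fisher_combined_p(p_values):
    """Fisher's method: -2*sum(ln p) follows chi-square with 2k df.

    For even degrees of freedom the chi-square survival function has the
    closed form exp(-t) * sum(t**i / i!) for i < k, with t = -sum(ln p).
    """
    t = -sum(math.log(p) for p in p_values)
    k = len(p_values)
    return math.exp(-t) * sum(t ** i / math.factorial(i) for i in range(k))

# Hypothetical copy number (log ratio) and expression values for one gene.
copy_number = [-0.4, -0.1, 0.0, 0.3, 0.9, 1.2]
expression = [6.1, 6.0, 6.8, 7.2, 9.5, 9.1]
rho = spearman_rho(copy_number, expression)

# Per-dataset p-values for ERBB2 as reported in the Results.
meta_p = fisher_combined_p([3.47e-09, 9.44e-06, 1.14e-06])
print(round(rho, 3), f"{meta_p:.2e}")
```
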

PCA plots were generated using Partek Genomics 
Suite (Partek, Missouri, USA). Pathway analyses were 
performed using Ingenuity Pathway Analysis software 
(Ingenuity, California, USA). The frequency plot for 
copy number altered regions in Figure 1 was generated 
using the threshold of |LRR| > 0.2. All statistical analyses 
and plots were done using the R programming package 
(http://www.r-project.org). 
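
The frequency calculation behind the Figure 1 plot can be illustrated with a short sketch, assuming one LRR value per sample at a given locus. The values and the function name are hypothetical; only the 0.2 threshold comes from the text.

```python
def alteration_frequency(lrr_values, threshold=0.2):
    """Percentage of samples amplified (LRR > threshold) or deleted (LRR < -threshold)."""
    n = len(lrr_values)
    amplified = sum(1 for v in lrr_values if v > threshold)
    deleted = sum(1 for v in lrr_values if v < -threshold)
    return 100.0 * amplified / n, 100.0 * deleted / n

# Hypothetical segment-level LRR values at one locus across 8 samples.
lrr_at_locus = [0.45, 0.10, -0.30, 0.25, 0.00, -0.05, 0.80, -0.50]
amp_pct, del_pct = alteration_frequency(lrr_at_locus)
print(amp_pct, del_pct)
```

Repeating this per locus and per histotype would give the amplification and deletion tracks of a frequency plot.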

Sub-sampling analyses to ascertain effects of sample size 

To assess the effects of the disparate sample sizes of the 
histotypes in the merged copy number data, multiple rounds of 
sub-sampling (with replacement) of the merged serous 
tumors were performed to estimate false positives and 
false negatives. The results showed that >97% of genes 
identified at sample sizes of 20-30 were also found at the full 
sample size of 101. However, 43-57% of genes found at the 
larger sample size were not identified in the smaller sample 
size datasets. In view of this, we have mainly confined 
comparison of genes to those found in the non-serous tumors. 
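
The sub-sampling procedure can be sketched as follows. This is only an illustration of the resampling logic: the gene caller is a toy frequency-based stand-in and not GISTIC, and the simulated samples, alteration rates, cutoffs, and function names are all hypothetical.

```python
import random

def call_genes(samples, threshold=0.2, min_fraction=0.3):
    """Toy caller (stand-in for GISTIC): genes altered in >= min_fraction of samples."""
    called = set()
    for gene in samples[0].keys():
        altered = sum(1 for s in samples if abs(s[gene]) > threshold)
        if altered / len(samples) >= min_fraction:
            called.add(gene)
    return called

def subsample_overlap(samples, size, n_rounds, rng):
    """Per round: share of subsample calls confirmed in the full set, and
    share of full-set calls missed by the subsample."""
    full = call_genes(samples)
    confirmed, missed = [], []
    for _ in range(n_rounds):
        sub = rng.choices(samples, k=size)  # sampling with replacement
        found = call_genes(sub)
        if found:
            confirmed.append(len(found & full) / len(found))
        if full:
            missed.append(len(full - found) / len(full))
    return confirmed, missed

# Simulated cohort of 101 samples with hypothetical per-gene alteration rates.
rng = random.Random(0)
gene_rates = {"geneA": 0.6, "geneB": 0.35, "geneC": 0.05}
samples = [{g: (0.5 if rng.random() < rate else 0.0)
            for g, rate in gene_rates.items()} for _ in range(101)]
confirmed, missed = subsample_overlap(samples, size=25, n_rounds=200, rng=rng)
print(sum(confirmed) / len(confirmed), sum(missed) / len(missed))
```

The first average corresponds to the share of small-sample calls that are recovered at full size, and the second to the share of full-size calls lost at small sample size.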

Additional files 